ABSTRACT

We have catalogued and analysed cosmological parameter determinations and their error bars published between the years 1990 and 2010. Our study focuses on the popularity of measurements, their precision and their accuracy. The accuracy of past measurements is gauged by comparison with the most recent WMAP results of Komatsu et al. (2011). The 637 measurements in our study are of 12 different parameters, and we place the techniques used to carry them out into 12 different categories. We find that the popularity of parameter measurements (published measurements per year) peaked between 1995 and 2004 in all 12 cases except for the dark energy equation of state parameter w_0. Of the individual techniques, only Baryon Oscillation measurements were still rising in popularity at the end of the studied time period. The quoted precision (fractional error) of most measurements has been declining relatively slowly, with several parameters, such as the amplitude of mass fluctuations σ_8 and the Hubble constant H_0, remaining close to the 10% precision level for a 10-15 year period. The accuracy of recent parameter measurements is generally what would be expected given the quoted error bars, although before the year 2000 the accuracy was significantly worse, consistent with an average underestimate of the error bars by a factor of 2. When used as a complement to traditional forecasting techniques, our results suggest that future measurements of parameters such as f_NL and w_a will have been informed by the gradual improvement in understanding and treatment of systematic errors, and are likely to be accurate. However, care must be taken to avoid the effects of confirmation bias, which may be affecting recent measurements of dark energy parameters. For example, of the 28 measurements of Ω_Λ in our sample published since 2003, only 2 are more than 1σ from the WMAP results. Wider use of blind analyses in cosmology could help to avoid this.

Key words: Cosmology: observations 



1 INTRODUCTION 



Modern cosmological parameters have been measured since Hubble's (1929) discovery of the expansion of the Universe. The number of model parameters increased during the late 1980s with the introduction of what is often referred to as the "Standard cosmological model" (e.g. Dodelson 2005). The idea of "Precision cosmology" emerged more recently, and by now many of the parameters in this model are well known (see e.g. Komatsu et al. 2011, hereafter WMAP7). This presents us with an interesting opportunity. By comparing past measurements of parameters and their error bars with the currently known values, we can evaluate how well the measurements were carried out in the past, how realistic the quoted uncertainties were, and which methods gave the most statistically reliable results. We can also study how both their precision and accuracy have varied with time. Such research will help us in our quest to make critical evaluations of what will be possible in the future. Because it works with past data, it also complements more conventional extrapolations of future technology and techniques (e.g. the report of the Dark Energy Task Force, hereafter DETF; Albrecht et al. 2006). In the present paper we make a first attempt at such a study, by compiling published parameter values taken from the NASA Astrophysics Data System over the years 1990-2010.

Previous studies of cosmological parameter determinations have tended to focus on the Hubble constant, H_0, for which there is a baseline of more than 80 years for analysis. Several papers have used the comprehensive database compiled by John Huchra¹ to generate their dataset, such as the study of the non-Gaussian error distribution in those measurements (Chen et al. 2003). Gott et al. (2001) used median statistics in a meta-analysis of these H_0 measurements to find the most probable value (and also analysed early measurements of Ω_Λ). Chen & Ratra (2003) have also used this median statistics approach to combine individual estimates of Ω_m0, the present mean mass density in non-relativistic matter. In the present paper we do not seek to combine the measurements from various works into best determinations of parameters. Instead we start from the assumption that the parameters we look at have been well measured (and that their correct values are close to the WMAP7 values), and see what this implies about past measurements.

¹ https://www.cfa.harvard.edu/~dfabricant/huchra/

We will therefore be starting from the assumption that ΛCDM is the correct cosmological model, and this should be borne in mind when interpreting our results. Even if the true cosmology turns out in the future to be something else, we expect that the effective values of the ΛCDM parameters are not likely to be very different (given the good fits to current data), so that our approach will have some value even then. Parameters that are at the moment unknown or very poorly constrained obviously cannot be studied at present with our approach. Examples are the non-Gaussianity parameter f_NL (e.g. Slosar et al. 2008) and the time derivatives of the dark energy equation of state parameter w (e.g. Chevallier & Polarski 2001). Instead, we hope that the general lessons from the past about the reliability of error bars, methods, and achievable precision and accuracy can usefully inform future efforts to measure those parameters.

The DETF report explains how four different techniques are being used, and will be used in the future, to constrain dark energy parameters. These techniques (gravitational lensing, baryon oscillations, galaxy cluster surveys and supernova surveys) all have a history, and have been involved in a large number of previous measurements of different parameters. It is interesting to see how they have performed in the past, and to evaluate them on the basis of this record. By looking over the published record, we can also show how measurement precision has changed, in terms of the quoted fractional error bars, and see how this compares with predicted future trends. One can ask, for example, whether the earlier error bars were unrealistically small, so that the quoted precision of measurements has not changed much. This should have consequences for the accuracy of measurements, which we will define and measure. In general, our motivation for this study can be summarized as follows. Once cosmological parameter measurements are published, they are for the most part ignored when newer work arrives. The dataset left behind can instead become a valuable resource to inform future work.

Our plan for the paper is as follows. In Section 2 we detail the source of the cosmological parameter estimates and how the data were collated. We explain the different categorizations of measurements and methods, and the standardization that was carried out. In Section 3 we outline the steps involved in our analysis of the data. We then present our results, including historical trends in some individual parameters and the precision and accuracy of measurements. In Section 4 we summarize our findings and discuss our results.



2 DATA 

We have made use of the NASA Astrophysics Data System (ADS)² to generate our dataset by carrying out an automated search of publication abstracts for the years 1990-2010. We limited the search to published papers that include cosmological parameter values and their error bars in the paper abstract itself. It is of course possible to carry out a more extensive analysis by searching the main text of each paper. We estimate from a random sampling that approximately 40% of parameter estimates are missed by our abstract-only technique, and we assume that this does not bias our sample. The total number of parameter measurements in the 20-year period studied is 637.

2.1 Parameters 

The search we use in the ADS abstract query form is for the following terms: "sigma8", "H0", "Omega", "Lambda", "m_nu", "baryon". We also restrict our search to the following journals: MNRAS, the Astrophysical Journal, ApJ Letters, ApJ Supplement, and Physical Review Letters. This search query appears restrictive, but it enables results for 12 different parameters to be found, including associated parameters. These 12 are:

(i) Ω_m, the ratio of the present matter density to the critical density,

(ii) Ω_Λ, the cosmological constant as a fraction of the critical density,

(iii) H_0, the Hubble constant,

(iv) σ_8, the amplitude of mass fluctuations,

(v) Ω_b, the baryon density as a fraction of the critical density,

(vi) n, the primordial spectral index,

(vii) β, equivalent to Ω_m^0.6/b, where b is the galaxy bias,

(viii) m_ν, the neutrino mass,

(ix) Γ, equivalent to Ω_m H_0/(100 km s^-1 Mpc^-1),

(x) Ω_m^0.6 σ_8, a combination that arises in peculiar velocity and lensing measurements,

(xi) Ω_k, the curvature,

(xii) w_0, the equation of state parameter for dark energy.

The measurements are generally quoted with 1σ errors on the parameters, but 7% have 2σ errors. In those cases we halve the 2σ error bars, so that the sample is uniform. We have tested the effect of excluding these 7% of measurements on our results (Section 3.3) and find that our conclusions are insensitive to this. Some of the measurements are also quoted with separate systematic and statistical error bars (6% of the sample). In this case we sum the statistical and systematic errors linearly to make a total error bar. We also test the effect of adding them in quadrature, or of ignoring the systematic part altogether (see Section 3.4).
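As an illustration, the error-bar standardization just described can be written as a small Python function. This is a minimal sketch, not the code used for this paper. The name `standardize_error` and its arguments are ours, as is the assumption that a separately quoted systematic error is given at the same confidence level as the statistical one.

```python
import math


def standardize_error(sigma, n_sigma=1, sigma_sys=None, combine="linear"):
    """Return one total 1-sigma error bar for a published measurement.

    sigma     : quoted statistical (or total) error bar
    n_sigma   : confidence level at which the errors were quoted (1 or 2)
    sigma_sys : separately quoted systematic error, if any
    combine   : "linear" (the default described in the text), "quadrature",
                or "stat_only" (the two alternatives that were tested)
    """
    # A 2-sigma error bar is halved. The systematic error is assumed here
    # to be quoted at the same confidence level as the statistical one.
    stat = sigma / n_sigma
    sys_err = 0.0 if sigma_sys is None else sigma_sys / n_sigma
    if combine == "linear":
        return stat + sys_err
    if combine == "quadrature":
        return math.hypot(stat, sys_err)
    if combine == "stat_only":
        return stat
    raise ValueError(f"unknown combine option: {combine!r}")


# Hypothetical example: H_0 = 72 +/- 3 (stat) +/- 7 (sys), quoted at 1 sigma.
for mode in ("linear", "quadrature", "stat_only"):
    print(mode, round(standardize_error(3.0, 1, 7.0, mode), 2))
```

Linear summation is the most conservative of the three options, because |a| + |b| ≥ √(a² + b²). The quadrature and statistical-only choices correspond to the two alternative treatments mentioned above.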

Our approach assumes that the WMAP7 results are correct within their quoted errors and that the ΛCDM model describes the observations well. We therefore use the ΛCDM model values to convert combinations of published parameters into those listed above. For example, when measurements of Ω_b h² are given we convert these into a value
² http://adsabs.harvard.edu/
Table 1. Fiducial values for the cosmological parameters used in this paper. These values are used when computing the accuracy of past measurements. Most parameters are taken from the last column of Table 1 in the WMAP7 paper, which gives the means of the posterior distribution of the combined WMAP+BAO+H_0 measurements (we have also tried the maximum likelihood parameters, with no difference in our results). The exceptions are parameters (ii), (viii), (xi) and (xii), for which we have assumed that an exactly flat ΛCDM model holds, with m_ν = 0 ± 0.1 eV. The quoted error bars are derived from the WMAP7 error bars. The exception is parameter (vii), for which an error bar of 0.1 is used to approximately take into account differences in galaxy bias between different samples. We explore the effect of adding these error bars in quadrature to the error bars of past measurements in Section 3.4.



Parameter              Central value              1σ error bar
(i)    Ω_m             0.274                      0.013
(ii)   Ω_Λ             1.0 - 0.274                0.013
(iii)  H_0             70.2 km s^-1 Mpc^-1        1.4 km s^-1 Mpc^-1
(iv)   σ_8             0.816                      0.024
(v)    Ω_b             0.0458                     0.0016
(vi)   n               0.968                      0.012
(vii)  β               0.460                      0.1
(viii) m_ν             0.0 eV                     0.1 eV
(ix)   Γ               0.193                      0.006
(x)    Ω_m^0.6 σ_8     0.376                      0.015
(xi)   Ω_k             0.0                        0.0
(xii)  w_0             -1.0                       0.0
Figure 1. Scatter plot of method vs. parameter. We plot as a point each of the 637 published measurements. The y-axis represents the 12 method bins of Section 2.2 and the x-axis the 12 parameters of Section 2.1. To make the points visible, we have added a random offset of a fraction of the bin width to each point. The red and black colours are used solely to enhance differentiation between the bins.



for Ω_b using the WMAP7 value of h = 0.702. For reference, we give our fiducial values of each parameter in Table 1. As stated in the caption, most of these are taken from Table 1 in WMAP7, but others are assumptions based on ΛCDM (e.g. w_0 = -1 exactly).
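The conversion of h-dependent combinations can be sketched in the same spirit. In this hypothetical helper (`remove_h_dependence` is our name), a published value of X h^p is divided by h^p, with h fixed at the WMAP7 value. The sketch treats h as exact and rescales the error by the same factor; that simplification is an assumption. The input numbers are purely illustrative.

```python
H_FID = 0.702  # WMAP7 value of h = H_0 / (100 km s^-1 Mpc^-1)


def remove_h_dependence(value, error, power, h=H_FID):
    """Convert a published X * h**power measurement into X at fixed h.

    Both the value and its error are divided by h**power. The uncertainty
    on h itself is ignored, which is an assumption of this sketch.
    """
    scale = h ** power
    return value / scale, error / scale


# Hypothetical inputs: Omega_b h^2 = 0.0226 +/- 0.0010 -> Omega_b
omega_b, omega_b_err = remove_h_dependence(0.0226, 0.0010, power=2)
# Hypothetical inputs: Omega_m h^2 = 0.135 +/- 0.010 -> Omega_m
omega_m, omega_m_err = remove_h_dependence(0.135, 0.010, power=2)
print(f"Omega_b = {omega_b:.4f} +/- {omega_b_err:.4f}")
print(f"Omega_m = {omega_m:.3f} +/- {omega_m_err:.3f}")
```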

2.2 Measurement Methods 

For each published measurement, we also choose a category based on the type of data and the method used to extract the cosmological parameter. There are obviously many possible categorizations, of differing coarseness. We choose the following 12 categories so that each contains a reasonable number of measurements (the mean is 53):

(i) Cosmic Microwave Background (CMB), specifically measurements of primary anisotropies,

(ii) Large-Scale Structure (LSS), which includes clustering of galaxies and galaxy clusters (BAO measurements and redshift distortions are considered separately), the Lyα forest, quasar absorption lines, and quasars,

(iii) Peculiar velocities, which includes measurements of galaxy peculiar velocities inferred from distance measurements and redshifts, and the cosmic dipole,

(iv) Supernovae, which includes techniques that use supernova distance measurements,

(v) Lensing, which includes constraints from the number of strong gravitational lenses, weak lensing shear, and gravitational lens time delays,

(vi) Big Bang Nucleosynthesis (BBN),

(vii) Clusters of galaxies, including their abundance and their masses; this category includes Sunyaev-Zeldovich measurements,

(viii) Baryonic Acoustic Oscillation (BAO) measurements from the large-scale structure of galaxies and clusters,

(ix) The Integrated Sachs-Wolfe effect (ISW),

(x) z distortions, i.e. redshift distortions of clustering,

(xi) Other, which includes Tully-Fisher distance estimates, galaxy ages and/or colours, globular cluster distances, the internal structure of galaxies, Cepheid distances, surface brightness fluctuations, reverberation mapping, radio source sizes, and Gamma Ray Burst distances,

(xii) Combined, which includes measurements that result from a combination of techniques or past measurements, without the addition of new measurements.

In Figure 1 we show a scatter plot of method vs. parameter for our dataset. The most popular parameter/method combination is Ω_m measured using galaxy clusters. In general, however, the measurements span a fairly wide range of methods and parameters: just over half (76 out of 144) of the combinations are covered by at least one published abstract.



3 ANALYSIS 

Our analysis is in two parts. The first part (Sections 3.1 and 3.2) studies general trends by year in the number of parameter measurements and in the popularity of different methods. It also looks at the measured value vs. year for a subset of parameters. In the second part (Sections 3.3 and 3.4), we compute the precision and accuracy of the measurements and see how these have varied with time.
Figure 2. Number of cosmological parameter measurements published per year, with curves representing the 12 different parameters listed in Section 2.1. Bins of width 3 years were used to compute the curves.



3.1 Number of studies by year 

In Figure 2 we show how the number of parameter measurements per year has varied with time. The results are shown averaged in bins of 3 years.
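For concreteness, the 3-year binning can be sketched as follows. The sketch assumes each measurement has been reduced to its publication year. The function name, the bin edges starting at 1990, and the example years are hypothetical. Dividing the count in each bin by the bin width gives a rate in measurements per year.

```python
from collections import Counter


def counts_per_year(years, start=1990, stop=2011, width=3):
    """Mean number of measurements per year in consecutive bins.

    years : one publication year per measurement
    Bins cover [start, stop) in steps of `width`; the final bin is
    truncated if the range is not a multiple of `width`.
    Returns a list of (bin_centre, measurements_per_year) pairs.
    """
    counts = Counter(years)
    result = []
    for lo in range(start, stop, width):
        hi = min(lo + width, stop)
        n = sum(counts[y] for y in range(lo, hi))
        result.append(((lo + hi - 1) / 2, n / (hi - lo)))
    return result


# Hypothetical publication years for measurements of one parameter.
years = [1990, 1991, 1991, 1995, 2000, 2001, 2001, 2001]
for centre, rate in counts_per_year(years):
    print(centre, round(rate, 2))
```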

It is immediately noticeable that nearly all of the parameters have a peak in the number of measurements around the years 2000-2003, followed by a decline in the post-WMAP1 (Spergel et al. 2003) era. The exceptions are measurements of w_0, which are still increasing in number, and constraints on m_ν. Of course, our choice of parameters largely guarantees this historical trend, since most of them are considered to have been well measured already. Other parameters, such as f_NL, w_a, or the modified gravity parameter E_G (see e.g. Reyes et al. 2011), would still be increasing on such a plot. Figure 2 can also be viewed as a measure of the extent to which parameters are considered to be well measured. For individual parameters such as σ_8, many measurements are still being published even now, but the decline is still present.

Another way to present the data is shown in Figure 3, where the popularity of different methods with time can be examined. Here it can be seen that "combined" methods are the exception to the general post-WMAP1 decline. In overall number, galaxy clusters have proven the most popular cosmological probe, with a sharp start in the early 1990s. Supernova and large-scale structure measurements have remained fairly constant since 2000. The popularity of gravitational lensing per year has been similar to that of galaxy clusters, but lags behind by about 8 years. BAO measurements are the only technique still on the rise, reflecting the current and future large-scale structure surveys targeted at BAO (e.g. Eisenstein et al. 2011, Blake et al. 2011, Schlegel et al. 2011).



3.2 History of individual parameter measurements

It is instructive to study the distribution of data points and their error bars as a function of time, and we do this for a subset of parameters in Figures 4 through 9. In each case we show the WMAP7 best-fit value for the parameter as a horizontal line. This type of plot is most familiar from Huchra's studies of the Hubble constant. There, the initial values reported by Hubble were over 5 times the currently accepted value.

In Figure 4 we show how Hubble constant determinations have changed over the last 20 years. This period begins at the end of a ~20-year timeframe during which measurements of H_0 were largely divided into two groups. One group was closer to 50 km s^-1 Mpc^-1 (e.g. Sandage & Tammann 1975), and the other closer to 100 km s^-1 Mpc^-1 (e.g. de Vaucouleurs et al. 1979). These two camps can be seen prior to 1995 in Figure 4. Their error bars are largely incompatible with each other, and indeed with the currently favoured value of H_0 = 70 km s^-1 Mpc^-1. The HST Key Project (hereafter KP) to measure the extragalactic distance scale published its first results in 1994 (Freedman et al. 1994) and its final results in 2001 (Freedman et al. 2001). The main contribution of the project was to extend the Cepheid-based rung of the distance ladder to cosmological distances. Freedman et al. (2001) combined this with other datasets to yield several different measurements. These datasets were Type Ia and Type II supernovae, the galaxy Tully-Fisher relation, surface brightness fluctuations and the galaxy fundamental plane. All of the resulting measurements were consistent with H_0 = 72 ± 8 km s^-1 Mpc^-1, meeting the goal of a ~10% measurement of H_0.

This post-1994 period of activity related to the KP is immediately apparent in Figure 4. Results from different methods were somewhat divergent at first, but they eventually became consistent with the final result by the end of the 1994-2001 KP period. An example is the determination from Type Ia supernovae, where the green points representing these measurements track steadily upwards from 1993 onwards. A large cluster of gravitational lensing time-delay measurements also exhibits a similar trend. Some other measures, such as galaxy cluster Sunyaev-Zeldovich measurements, are also somewhat lower than H_0 = 70 km s^-1 Mpc^-1. This set of lower results largely disappears by 2003. That is when the next sudden tightening of determinations occurs, coincident with the WMAP1 data release. The WMAP1 best-fit value of H_0 was 72 ± 5 km s^-1 Mpc^-1, and after that date essentially all measurements are consistent with it. The WMAP1 result was strikingly similar to the KP result, even though it involved radically different physics. Figure 4 suggests that the combination of the two sets of measurements was enough to convince most researchers that the measurement goal had been reached. In the future, a measurement of H_0 to even higher accuracy will be needed to make truly accurate constraints on dark energy parameters (see e.g. the DETF report).

Figure 3. Number of cosmological parameter measurements published per year, with curves representing the 12 different measurement methods listed in Section 2.2. Bins of width 3 years were used to compute the curves.

Figure 4. Individual published values of the Hubble constant, H_0, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details).

Figure 5. Individual published values of the density parameter, Ω_m, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.

There are a few obviously discordant points. For example, Leith et al. (2008) find H_0 ≈ 61.7 km s^-1 Mpc^-1 from a combined analysis of several datasets. Their analysis is not set in the ΛCDM model. Instead, it uses a model in which cosmological averaging can explain the acceleration of the Universe (Wiltshire 2007).

In Figure 5 we show the history of measurements of Ω_m, the most frequently measured parameter in our dataset. Before 1999, approximately 1/3 of the measurements were consistent with high values of Ω_m ~ 0.5-1.0. The most popular technique in this early period involved the use of galaxy peculiar velocities. The error bars were large, yet several points are not consistent with the eventual WMAP7 value of Ω_m = 0.274 ± 0.013. After 1999, peculiar velocities remained popular, but the measurements no longer sampled the high-Ω_m end of parameter space. As with the H_0 results, a second significant tightening of published values around the final range took place in the years 2004-2005, shortly after the WMAP1 results.

The amplitude of mass fluctuations, σ_8, is examined in Figure 6. The abundance of galaxy clusters is easily the most popular method used to measure this parameter, and the effort started in earnest around 1995. The cluster measurements of σ_8 are roughly evenly spread around the WMAP7 value of σ_8 = 0.816 ± 0.024 until after the WMAP1 release, when low values (below σ_8 ~ 0.8) ceased to be published. As with the other parameters, there is evidence of post-WMAP1 tightening. Lensing determinations of σ_8 seemed to favour high values, σ_8 ~ 1, until after WMAP1.

Turning to the baryon density parameter Ω_b, Figure 7 shows that the measurements are mainly concentrated in an 8-year period between 1996 and 2004. Two features can be clearly seen over this time span. The first is the steady rise in Ω_b measured using BBN, and the second is the start of CMB measurements around 2000. A high deuterium-to-hydrogen ratio (which is easier to detect) implies a low value of Ω_b, and this may account for the difficulty encountered in early BBN measurements. The CMB and BBN results were, however, consistent well before the WMAP tightening, which occurred around 2003-2004 as with the other parameters.

In Figure 8 we plot the measurements of Ω_Λ. In this case, many of the early points are upper limits that were just consistent with the eventually measured value. The first Type Ia supernova results showing acceleration appeared at the end of this era of upper limits. The tightening of results around 2003, probably related to WMAP, is especially pronounced in this plot, where the published error bar sizes drop immediately. It is interesting to note that after 2002, almost all measurements of Ω_Λ are consistent with the fiducial value from Table 1. Consider the most recent 28 measurements shown in Figure 8, which contribute to the last 2 points in the Ω_Λ accuracy plot (Figure 12 in Section 3.4). Only 2 of them are more than 1σ from the "correct" value. The total χ² of these 28 measurements with respect to the Ω_Λ value in Table 1 is 22.7 (about 0.81 per data point), which does not sound very small. However, this total includes the measurement of Cabre et al. (2006), which is 4.0σ from the Table 1 value. Without this outlier, the χ² per data point is only 0.26. This could be a signature of overestimated error bars, or perhaps of "confirmation bias". We will return to this in Section 4.
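The χ² figures quoted above follow from a simple sum over measurements. The sketch below uses hypothetical Ω_Λ values and error bars, not the 28 measurements in our sample, and the function name is ours. It computes the total χ² with respect to the Table 1 value, the χ² per data point, and the number of points more than 1σ away. It also shows how a single outlier can dominate the total. For correctly estimated Gaussian error bars, the expected χ² per data point is 1.

```python
def chi2_summary(values, errors, fiducial):
    """Chi^2 of measurements about a fixed fiducial value.

    Returns (total chi^2, chi^2 per data point, number of points more
    than 1 sigma from the fiducial value).
    """
    deviations = [(v - fiducial) / e for v, e in zip(values, errors)]
    chi2 = sum(d * d for d in deviations)
    n_beyond_1sigma = sum(1 for d in deviations if abs(d) > 1.0)
    return chi2, chi2 / len(deviations), n_beyond_1sigma


OMEGA_LAMBDA_FID = 1.0 - 0.274  # Table 1, row (ii)

# Hypothetical measurements and 1-sigma errors; the last one is an outlier.
values = [0.73, 0.70, 0.75, 0.60]
errors = [0.02, 0.03, 0.04, 0.03]

print(chi2_summary(values, errors, OMEGA_LAMBDA_FID))
# Dropping the outlier shows how strongly one point dominates chi^2.
print(chi2_summary(values[:-1], errors[:-1], OMEGA_LAMBDA_FID))
```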

The final parameter for which we examine the individual measurements is the dark energy equation of state parameter w_0, which we show in Figure 9. In this case there are no measurements or limits before the SN measurements
Figure 6. Individual published values of the amplitude of mass fluctuations, σ_8, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details).
Figure 7. Individual published values of the baryon density as a fraction of the critical density, Ω_b, as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.



of the acceleration of the Universe in the late 1990s (Perlmutter et al. 1999, Riess et al. 1998). Around the time of WMAP1, the first actual measurements of w_0 (rather than limits) started to be published. Since then, SN have remained the most popular probe of this parameter. A trend that is more apparent for this more recently measured parameter is the large number of points from "combined" measurements. One could argue that w_0 is not at present as well known as some of the other parameters, but we have nevertheless plotted the fiducial value on this graph as w_0 = -1 exactly. All measurements since 2004 (bar 2) are consistent with this value at the 1σ level.
Figure 8. Individual published values of the vacuum density parameter Ω_Λ as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.
Figure 9. Individual published values of the dark energy equation of state parameter w_0 as a function of year. We show one sigma error bars on the points, and the point colour (shown in the legend) denotes the technique used to make the measurement (see Section 2.2 for more details). 1 (2) sigma upper and lower limits are shown using single (double) arrows.



3.3 Precision 

One of the common themes to emerge in the past few years is that we are now in the era of "precision cosmology". It is instructive to study what the data reveal about how we reached this point, and what precision is currently achievable for the different parameters and techniques. We define the precision of a measurement as the size of its 1σ error bar, expressed as a percentage of the fiducial (WMAP7) value of the parameter. We have also tried expressing the error bar size as a percentage of the quoted central value of each measurement. This makes no significant difference to our results, except for Ω_Λ, for which the latter is not a useful way of examining earlier data. For the neutrino mass, only upper limits are available. We therefore take the precision to be the upper limit on m_ν divided by the value of m_ν required for the closure density (i.e. Ω_m = 1) in neutrinos, which is m_ν = 93h² eV. For Ω_k we divide the measurement error by 1.0, as the fiducial value is Ω_k = 0. Our definition of precision in this case is therefore more directly related to the precision on the measured distance to the surface of last scattering (see e.g. Hu & Dodelson 2002).
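The precision definition, including its two special cases, can be sketched as below. The parameter labels and the function name are hypothetical. The closure neutrino mass uses 93h² eV with h = 0.702, i.e. about 45.8 eV.

```python
H_FID = 0.702  # WMAP7 h

# m_nu for which neutrinos alone reach the closure density: 93 h^2 eV.
M_NU_CLOSURE_EV = 93.0 * H_FID ** 2


def precision_percent(parameter, error, fiducial=None):
    """Precision: 1-sigma error as a percentage of a reference scale.

    parameter : label such as "H0", "m_nu" or "Omega_k" (labels are
                hypothetical, chosen for this sketch)
    error     : quoted 1-sigma error, or the upper limit in eV for m_nu
    fiducial  : Table 1 central value (unused for m_nu and Omega_k)
    """
    if parameter == "m_nu":
        reference = M_NU_CLOSURE_EV  # only upper limits are available
    elif parameter == "Omega_k":
        reference = 1.0  # the fiducial value is exactly zero
    else:
        reference = abs(fiducial)
    return 100.0 * error / reference


print(round(precision_percent("H0", 7.0, 70.2), 1))  # about 10%
print(round(precision_percent("m_nu", 1.3), 1))  # hypothetical 1.3 eV limit
print(round(precision_percent("Omega_k", 0.01), 1))  # 1% on curvature
```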

In Figure 10 we show the average precision for each parameter as a function of year. We have binned the measurements into bins of width 4 years. The average precision in each bin is the unweighted mean of the measurements it contains. We show Poisson error bars on the mean precision. Apart from m_ν, we have not included any upper or lower limits on parameters in this plot, only published measurements of values with error bars.
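The binned, unweighted average can be sketched as follows. The text does not spell out how the "Poisson" error bars are computed. This sketch therefore substitutes the standard error of the mean (the sample standard deviation divided by √N). That choice, the function name, and the example numbers are all assumptions.

```python
import math
from collections import defaultdict


def binned_mean(years, values, start=1990, width=4):
    """Unweighted mean of per-measurement values in bins of `width` years.

    Returns {bin_start_year: (mean, error_on_mean, n)}. The error is the
    sample standard deviation divided by sqrt(n), a stand-in chosen for
    this sketch rather than the paper's stated "Poisson" errors.
    """
    bins = defaultdict(list)
    for year, value in zip(years, values):
        bins[start + width * ((year - start) // width)].append(value)
    result = {}
    for lo, vals in sorted(bins.items()):
        n = len(vals)
        mean = sum(vals) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)
            err = math.sqrt(var / n)
        else:
            err = float("nan")  # undefined for a single measurement
        result[lo] = (mean, err, n)
    return result


# Hypothetical precisions (in %) of four measurements of one parameter.
print(binned_mean([1990, 1991, 1995, 1996], [10.0, 20.0, 5.0, 7.0]))
```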

The general appearance of Figure 10 shows that the precision of most measurements has not improved very steeply. The log scale of the y-axis is partly responsible for this impression. Even so, 6 of the 12 parameters shown have a mean precision in the latest bin that is compatible (within 1σ) with that in the earliest bin. This may be because the role of possible systematic errors has become better understood over time. The value of H_0, for example, is now known to better than 10% for an average measurement, after a long period in which the precision did not improve. Currently the most precisely known parameters are the curvature Ω_k and the primordial spectral index n, which are both known to about 1%. A large group of parameters are currently known to about 10% precision, ranging from Ω_m (17%), through Ω_b, Ω_Λ and σ_8, to H_0 (7%).

In Figure 11 we show the precision of measurements as a function of the technique used. Many of the techniques are used to measure several different parameters. A decrease in precision with time could therefore be caused by a switch to a less well measured parameter. This may indeed be happening in some cases, or else, again, systematic errors are being confronted more as time goes on. We can differentiate between these possibilities by considering the averaged accuracy of measurements, which we do in the next Section. For now, lensing, redshift distortion and peculiar velocity measurements show no improvement in quoted precision with time. The CMB, on the other hand, has improved by about an order of magnitude over the 20-year period, and cluster measurements by about a factor of 3. Supernova measurements are also more precise now than they were in the late 1990s, by a factor of 2.

3.4 Accuracy 

Because we assume that the "correct" values of the different cosmological parameters are available, we can compute a potentially powerful statistic: the accuracy of measurements. We define this to be the absolute difference between a measured value of a parameter and our fiducial value for that parameter (as listed in Table 1), divided by the quoted 1σ error bar for that measurement. We denote this quantity Δσ. Averaged over a set of measurements, it gives the mean number of standard deviations by which measurements lie from the correct value. For a Normal distribution of errors with correctly quoted error bars, the average value of Δσ is √(2/π) ≈ 0.8 and its root-mean-square value is 1, so Δσ = 1 serves as a convenient approximate reference level. Values well below this indicate that the error bars have been overestimated, and values well above it indicate that they have been underestimated. Small values may also be evidence for "confirmation bias", in which values closer to the expected ones are favoured (not necessarily consciously). We have chosen to use Δσ as our statistic rather than χ², as it is more robust to outliers (it does not depend on the square of the difference between a measurement and the true value). Qualitatively similar conclusions would result, however, if we used the χ² of measurements with respect to the "known" values as a measure of accuracy.

When computing the accuracy, one must decide how 
the uncertainty on the true values of the parameters affects 
the results. Two choices which approximately bracket the 
range of potential effects are to either add the $1\sigma$ error bars
on the values in Table 1 in quadrature to the error bars 
on each published measurement, or else to assume no ad- 
ditional uncertainty beyond the quoted error bar for each 
published measurement. We have tried both, finding almost 
imperceptible quantitative differences which do not affect 
any of our conclusions. This can be understood from the 
fact that the error bars in Table 1 are much smaller than 
those on past measurements. In making our plots, we have 
chosen the second option, on the grounds that some of the 
uncertainty from the other measurements has already been 
incorporated into the WMAP7 results we use in Table 1. 
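The accuracy statistic defined above, and the two bracketing treatments of the fiducial uncertainty, can be sketched as follows. This is a minimal illustration assuming each measurement is a (value, quoted $1\sigma$ error) pair for a single parameter; the function names are hypothetical. Setting `fiducial_sigma=0` corresponds to the option used for the plots, while a non-zero value adds the Table 1 uncertainty in quadrature. The $\chi^2$-style helper is included only to show why $N_\sigma$ is less sensitive to outliers: a point $3\sigma$ away contributes 3 to the $N_\sigma$ sum but 9 to the $\chi^2$ sum.

```python
import math
from statistics import mean


def n_sigma(value, sigma, fiducial, fiducial_sigma=0.0):
    """Number of standard deviations between one measurement and the fiducial value.

    fiducial_sigma=0.0 uses only the published error bar; a positive value adds
    the fiducial's own 1-sigma uncertainty in quadrature.
    """
    combined = math.hypot(sigma, fiducial_sigma)  # sqrt(sigma**2 + fiducial_sigma**2)
    if combined <= 0:
        raise ValueError("the combined error bar must be positive")
    return abs(value - fiducial) / combined


def accuracy(measurements, fiducial, fiducial_sigma=0.0):
    """Unweighted mean of N_sigma over a list of (value, sigma) pairs."""
    return mean(n_sigma(v, s, fiducial, fiducial_sigma) for v, s in measurements)


def chi2_per_measurement(measurements, fiducial):
    """Mean squared deviation in units of sigma (chi^2 divided by the number of points)."""
    return mean(((v - fiducial) / s) ** 2 for v, s in measurements)


# Hypothetical measurements of one parameter whose fiducial value is 0.0 +/- 0.1.
data = [(1.0, 1.0), (3.0, 1.0), (0.0, 2.0)]
print(accuracy(data, 0.0))                      # (1 + 3 + 0) / 3
print(accuracy(data, 0.0, fiducial_sigma=0.1))  # slightly smaller
print(chi2_per_measurement(data, 0.0))          # (1 + 9 + 0) / 3, dominated by the outlier
```

Because the fiducial error bars are much smaller than those on past measurements, the two `accuracy` calls differ only slightly, in line with the near-imperceptible differences reported above.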

In Figure 12 we show the accuracy for the different
parameters as a function of year, with the binning by year
carried out as for the precision (Figure 11) and each point
being an unweighted average of the measurements in that bin.
Poisson error bars have been computed as before. In general,
one can see that the measurements in the different panels are
not extremely offset from the $N_\sigma = 1$ line, indicating that
the accuracy of parameter determinations has not been wildly off.
That said, the $N_\sigma = 1$ line is a good fit by eye in only one
panel, that for the shape parameter $\Gamma = \Omega_m h$.
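The per-year points can be reproduced along these lines. This is a sketch under stated assumptions: the bin width and starting year are parameters chosen here for illustration, and the "Poisson" error on each bin is taken to be the bin mean divided by the square root of the number of measurements in the bin, since the text does not spell out the exact formula. The function name `bin_by_year` is hypothetical.

```python
import math
from collections import defaultdict


def bin_by_year(records, start=1990, width=2):
    """Unweighted per-bin means of (year, value) records, with assumed Poisson errors.

    Returns a list of (bin_start, mean, error, count) tuples sorted by bin_start,
    where error = mean / sqrt(count) is an illustrative assumption.
    """
    bins = defaultdict(list)
    for year, value in records:
        bin_start = start + width * ((year - start) // width)
        bins[bin_start].append(value)
    summary = []
    for bin_start in sorted(bins):
        values = bins[bin_start]
        count = len(values)
        bin_mean = sum(values) / count
        summary.append((bin_start, bin_mean, bin_mean / math.sqrt(count), count))
    return summary


# Hypothetical N_sigma values tagged with publication year.
for row in bin_by_year([(1990, 1.0), (1991, 3.0), (1994, 2.0)]):
    print(row)
```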

Turning to individual parameters in Figure 12, we can
see that the accuracy of measurements of $H_0$ and $\sigma_8$ has
improved over the last 20 years, so that the most recent
measurements appear to have realistic error bars. The error
bars on $\Omega_\Lambda$ appear to be overestimated, as do the most
recent error bars on $\Omega_m$. From this it would appear that
significant work on understanding the overall level of
measurement uncertainty has been carried out successfully for
$H_0$ and $\sigma_8$, but that this has not happened for some of the
other parameters. We return to this topic in Section 4.2.

If the varying accuracy is more tightly related to the
choice of technique than of parameter, then we can expect the
plot of accuracy for different techniques (Figure 13) to be
more instructive. Here we can see that some techniques do
have a better track record than others. For example, the use
of peculiar velocities to measure parameters has resulted in
an underestimate of error bars by a factor of roughly 2 on
average (there is some improvement in the most recent two
points, which are consistent at $1\sigma$ with $N_\sigma = 1$). The CMB
and redshift distortions, on the other hand, have proven
accurate sources of measurements for the whole period. Galaxy
clusters were sources of measurements with underestimated
errors in the 1990s, but in the last 12 years have tracked
$N_\sigma = 1$ very well.

Figure 10. The quoted precision of measurements as a function of year for our different cosmological parameters. The precision is defined to be the size of the $1\sigma$ error bar as a percentage of the fiducial parameter value in Table 1. Error bars are Poisson errors computed from the number of measurements in each bin.

The most recent 2 points for SN and 3 for BAO appear
to have overestimated error bars, significantly so in the case
of SN (by a factor of 3). SN measurements are those which
most often quote systematic error bars (which we have added
directly to the statistical errors). We have tried two other
ways of dealing with the systematic errors, either adding
them in quadrature or ignoring them altogether. With the
latter treatment, which gives the smallest error bars, the SN
results yield $N_\sigma = 0.5 \pm 0.07$ and $0.77 \pm 0.12$ for the most
recent two points. This is an improvement, indicating that
the SN systematic error bars may well be too conservative.
$N_\sigma$ is still below 1, but now by an amount similar to the
differences seen between the $N_\sigma = 1$ line and some data
points in the "other", "combined" and "LSS" panels. If we
allow for the possibility that the Poisson error bars on our
data points in Figure 13 are underestimates, and that there
may be correlations between measurements in different years,
then this may go some way to reconciling the measurements
and their hoped-for accuracy. We return to this point in our
discussion below (Section 4.2).
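The three treatments of systematic errors compared above can be written as a single helper. The sketch below uses hypothetical names and a hypothetical example measurement. It shows that adding the systematic term directly gives the largest error bar and ignoring it gives the smallest, so $N_\sigma$ is lowest for the direct sum and highest when systematics are dropped.

```python
import math


def total_error(stat, syst, mode="linear"):
    """Combine statistical and systematic 1-sigma errors under one of three treatments."""
    if mode == "linear":      # added directly, the default treatment in the text
        return stat + syst
    if mode == "quadrature":  # added in quadrature
        return math.hypot(stat, syst)
    if mode == "ignore":      # systematic error dropped altogether
        return stat
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical measurement 0.80 +/- 0.05 (stat) +/- 0.05 (syst) against a fiducial of 0.73.
value, stat, syst, fiducial = 0.80, 0.05, 0.05, 0.73
for mode in ("linear", "quadrature", "ignore"):
    print(mode, abs(value - fiducial) / total_error(stat, syst, mode))
```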

One question which is not easy to answer from the
multipanel Figures 12 and 13 is how the overall accuracy of
measurements is changing by year. Are cosmological
measurements improving as both theoretical knowledge and
expertise in dealing with experimental uncertainties improve?
We can see that this does appear to be the case by
considering Figure 14, which plots accuracy by year for results
published in the two main journals, MNRAS and ApJ (including
ApJL and ApJS). These account for 35% and 55% of all results
in our compilation, respectively. The results before the year
~2003 are significantly inaccurate, but steadily improve with
time until, after this date, they become consistent with the
$N_\sigma = 1$ line. Both journals exhibit the same behaviour within
their error bars.

Figure 14. The accuracy of measurements as a function of year. The accuracy is defined to be the difference between the quoted measurement and the fiducial parameter value in Table 1 in units of the quoted measurement $1\sigma$ error bar. Error bars are Poisson errors computed from the number of measurements in each bin. We show separately the accuracy for measurements from the two journals with the most published measurements.

Using our tabulated data we can explore a few more
aspects of the accuracy of measurements. One can ask whether
the accuracy of published results affects the number of times
they are cited, and therefore whether recognition increases
with the perception that measurements are accurate. We
address this in Figure 15, where we plot the accuracy vs. the
number of citations to a paper, both on a log scale. There
appears to be little evidence for any relationship between
the two, so that accuracy is not an important factor in
determining the number of citations. Looking at Figure 15, it
does seem that there might be slightly fewer papers with high
$N_\sigma$ (inaccurate) and high citations than in other corners of
the plot. This leads to a Pearson correlation coefficient of
$r = -0.066$, and therefore a slight correlation between
citations and accuracy, in that papers with higher accuracy
(lower $N_\sigma$) have more citations. A set of points with no
correlation would give such a result 11% of the time, however,
so the evidence for this is marginal.
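The text does not say how the 11% figure was obtained. One standard way to estimate how often uncorrelated data would give a Pearson coefficient at least as large in magnitude is a permutation test, sketched below under our own assumptions: both quantities are correlated in log10 (matching the log axes of Figure 15), entries with zero citations or $N_\sigma = 0$ are dropped because their logarithm is undefined, and the test is two-sided. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=10000, seed=0):
    """Two-sided p-value: fraction of random re-pairings with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed pairing itself, so p is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


def log_correlation(n_sigma_values, citations, **kwargs):
    """Correlate log10(N_sigma) with log10(citations), skipping non-positive entries."""
    pairs = [(math.log10(a), math.log10(c))
             for a, c in zip(n_sigma_values, citations) if a > 0 and c > 0]
    x, y = zip(*pairs)
    return pearson_r(x, y), permutation_p_value(x, y, **kwargs)
```

A permutation test avoids assuming a particular sampling distribution for $r$; with several hundred measurements a normal-theory p-value would usually give a similar answer.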

We note that there does exist a significant correlation
between the precision of measurements and the number of
citations (not plotted). We find a correlation coefficient of
$r = -0.134$ (results with smaller fractional errors are more
cited) and probability $p = 0.0026$ when correlating these two
values. It is relatively easy to find a possible explanation
for this, as there is also a correlation between year of
publication and precision ($r = -0.748$, $p = 9.7 \times 10^{\dots}$),
which is just due to the overall trend of improving
measurements, and a correlation between year of publication
and citations ($r = 0.348$, $p = 0.148$). This latter is
presumably due to the larger number of researchers working in
cosmology. Both of these trends combine to produce the trend
of citations with precision.

Figure 16. The distribution of measurement errors in units of the quoted standard deviation. For each measurement we divide the difference between the quoted value and the fiducial value in Table 1 by the quoted $1\sigma$ error bar. The results are shown as a histogram. We also show the expected curve for a Gaussian distribution of errors (smooth line).

Figure 13. The accuracy of measurements as a function of year for the different measurement techniques. The accuracy is defined to be the difference between the quoted measurement and the fiducial parameter value in Table 1 in units of the quoted measurement $1\sigma$ error bar. Error bars are Poisson errors computed from the number of measurements in each bin. The dashed line is the expectation for Gaussian statistics, $N_\sigma = 1$.

A final issue which we address when looking at the
accuracy is the shape of the error distribution. When stating
that $N_\sigma = 1$ is appropriate for an accurate set of
measurements, we have made the assumption that all quoted
errors have a Gaussian distribution. This assumption is often
made (although not by all), and it is something which we can
examine using our data, by comparing the distribution of the
number of standard deviations by which measurements lie from
our fiducial values with the curve for a Gaussian
distribution. This will tell us, for example, whether there is
a long non-Gaussian tail to the error distribution. We show
the histogram of $N_\sigma$ values in Figure 16, along with the
Gaussian curve. The data are fairly similar to the Gaussian
curve at the low end of the $N_\sigma$ range, where the majority
of the data reside, showing that in general error bars are
only slightly underestimated (as we have already seen in
Figure 14, for example). There is, however, a long tail
extending to high values, with some measurements lying 8 or
even $10\sigma$ away from their fiducial values. With a Gaussian
distribution the chance of such events occurring would be
minuscule. We can quantify this further by computing the
fraction of measurements which are more than $2\sigma$ away from
the correct value. We find that 19% of measurements are like
this, rather than the 5% expected for a Gaussian distribution.
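The Gaussian expectations quoted above follow from the complementary error function: the two-sided probability of lying more than $k$ standard deviations from the mean is $\mathrm{erfc}(k/\sqrt{2})$. The sketch below (hypothetical function names) evaluates this for $k = 2$, giving about 4.55%, which the text rounds to 5%, and shows how small the probability becomes at $8\sigma$ and $10\sigma$. It also computes the empirical fraction for a list of $N_\sigma$ values.

```python
import math


def gaussian_tail(k):
    """Two-sided probability that a Gaussian variable lies more than k sigma from its mean."""
    return math.erfc(k / math.sqrt(2.0))


def empirical_tail(n_sigma_values, k=2.0):
    """Fraction of measurements lying more than k quoted sigma from the fiducial value."""
    values = list(n_sigma_values)
    return sum(1 for v in values if v > k) / len(values)


print(f"P(>2 sigma)  = {gaussian_tail(2.0):.4f}")
print(f"P(>8 sigma)  = {gaussian_tail(8.0):.1e}")
print(f"P(>10 sigma) = {gaussian_tail(10.0):.1e}")
print(empirical_tail([0.5, 1.0, 2.5, 3.0]))  # hypothetical N_sigma values
```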



4 SUMMARY AND DISCUSSION
4.1 Summary 

We have compiled cosmological parameter measurements 
published between 1990 and 2010 and the techniques used to 
measure them. Using this data we have carried out an analy- 
sis of historical trends in popularity, precision and accuracy. 
The accuracy of past measurements has been estimated by 
assuming that WMAP7 parameter values of Komatsu et al. 
(2011) (combined with $\Lambda$CDM standard values for e.g. $w_0$)
are the correct ones. Our findings can be summarised as 
follows: 

(1) The number of published measurements for differ- 
ent parameters peaks between 1995 and 2004 for all cases, 
except for $w_0$, for which the number was still rising in 2010.

(2) Of all techniques used to measure the parameters, 
only baryon oscillation and "combined" measurements were 
still rising in terms of publications per year by 2010. 

(3) The quoted precision of measurements has been
declining relatively slowly for most parameters, with several
(e.g. $\sigma_8$ and $H_0$) remaining flat for 10-15 years.

(4) The accuracy of recent parameter measurements is
generally what should be expected based on the quoted error
bars, i.e. the error bars overall are neither underestimated
nor overestimated (an accuracy $N_\sigma = 1.0$, within the Poisson
uncertainty on the measurement). Before 2000, the accuracy
was closer to 2, indicating underestimation of the error bars
by a factor of 2. Overall, there is a small non-Gaussian tail
to the error distributions (we find that about 20% of
measurements are more than $2\sigma$ away from the true values).

(5) The accuracy of most methods has become consistent
with $N_\sigma = 1.0$, with the historically most inaccurate
parameter measurement technique being the use of galaxy
peculiar velocities. Measurements of $\Omega_m$ and particularly
$\Omega_\Lambda$ made since 2000 tend to have $N_\sigma$ significantly
less than 1.0, indicating "confirmation bias" and/or an
overestimation of error bar sizes.



4.2 Discussion 

Over the 20 year period covered in this study, it is apparent
that many of the parameters in what is now the concordance
$\Lambda$CDM cosmological model went from the status of no
information, or only limits, to being known at the 10% level
or better. It is also apparent from Figure 2 that there was a
"golden age" of parameter measurements between ~1995
and ~2005, during which the number of published measurements
peaked sharply and then declined. This seems to indicate that
for many purposes (such as the use of a background cosmology
in galaxy formation models), the precision to which the
$\Lambda$CDM parameters were known by the time of the first WMAP
results was sufficient, and many of the reasons for pinning
down the model better had diminished after that.

This said, the exception to this rule, measurements of $w_0$
(which are still rising in terms of number per year at the
end of our study), seems to point to a coming new era in
parameter measurement. Certainly, the motivation for the large
number of ongoing and future large-scale structure, lensing
and other surveys is to hunt for the signatures of dynamical
dark energy and modified gravity, and given the number of
researchers carrying out these studies it is likely that
measurements will continue to rise. Many parameters which we
have not catalogued are now within reach of quantitative
study. These include the modified gravity parameter $E_G$
(Reyes et al. 2010) and the time derivative of the equation
of state parameter, $w_a$. Measurements of such parameters
involve searching for deviations from the concordance
$\Lambda$CDM model and fall into a different category from most of
the parameters we have studied in this paper. Inflationary
parameters such as the non-Gaussianity parameter fNL, or the
tensor-to-scalar ratio $r$, will be pinned down with higher
precision in the future, and these should also represent a
growth area. As most future measurements are largely framed
in terms of a quest for fundamental physics, it would be
logical to assume that they will continue until the cause of
the Universe's acceleration is better understood. Likewise,
parameters describing the dark matter particle should be
added to this category.

Possible behaviours for the precision of future parameter
measurements can be predicted by looking at the past results
(Figure 10). There is a very wide range, but most parameters
improve slowly, with a factor of 10 improvement in precision
over the 20 years representing the extreme (2 out of 12
parameters). The precision of some parameters has remained
relatively flat for the whole period, so this is a possibility
for future, so far unconstrained, parameters. An argument
against this slow progress, however, is the fact that many new
surveys (such as those of baryon oscillations) are targeted
primarily at measurements of specific parameters, and this
aggressive approach (for example, including specific
precisions to be obtained at a given time in survey proposal
documents) could lead to faster progress.
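A simple way to turn the historical improvement factors quoted in this paper into a rate is to assume that the fractional error shrinks by a constant factor each year. This is an illustrative assumption, not a model used in the text, and the parameters that stayed flat for 10-15 years show that real progress need not follow it. Under that assumption, the extreme case of a factor of 10 over 20 years corresponds to a factor of about 1.12 per year. The function names are hypothetical.

```python
import math


def yearly_factor(total_factor, years):
    """Constant per-year improvement factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def years_to_reach(current_percent, target_percent, factor_per_year):
    """Years needed to go from current to target precision at a constant yearly factor."""
    return math.log(current_percent / target_percent) / math.log(factor_per_year)


extreme = yearly_factor(10.0, 20)  # factor of 10 in 20 years, the extreme case above
modest = yearly_factor(3.0, 20)    # factor of 3 in 20 years, as quoted for clusters
print(f"extreme: {extreme:.3f} per year, modest: {modest:.3f} per year")
print(f"10% -> 1% takes {years_to_reach(10.0, 1.0, extreme):.0f} years at the extreme rate")
print(f"10% -> 1% takes {years_to_reach(10.0, 1.0, modest):.0f} years at the modest rate")
```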

Our investigation of the accuracy of results could po- 
tentially lead to some of the most interesting findings. We 
have seen that in the earlier half of our studied time pe- 
riod there is evidence that the error bars were significantly 
underestimated, but that this has changed over time. 

When discussing the accuracy, we should be aware that it
was not possible in our analysis to take into account several
factors which have the potential to affect our conclusions.
For example, we do not keep track of the priors that people
have assumed in their measurements, and in many of the later
cases these may include the WMAP results. This is likely to be
responsible for much of the post-WMAP1 tightening of
constraints seen in Figures 4-[O]. When computing the error bars
on the mean accuracies of measurements (Figures 12 and 13) we
have used Poisson errors based on the number of measurements
in each bin. This will tend to underestimate the uncertainty
on the accuracy, because some of those measurements could be
using the same underlying data, or similar priors, or a
combination of the two. There will therefore be correlations
between the error bars, so that our estimates of the accuracy
will be affected. Equivalently, the chi-square of the fiducial
result compared to the data points will be incorrectly
determined to be low because the correlations are not
included.
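The effect of correlations on the error of a bin average can be illustrated with a deliberately simple model, which is our assumption rather than anything fitted in the text: $n$ measurements with the same scatter $\sigma$ and a common pairwise correlation coefficient $\rho$. The variance of their mean is then $\sigma^2[1 + (n-1)\rho]/n$, so for $\rho > 0$ the error shrinks more slowly than $1/\sqrt{n}$, and an error bar computed as if the measurements were independent is too small. The function names are hypothetical.

```python
import math


def error_on_mean(sigma, n, rho=0.0):
    """1-sigma error on the mean of n equally uncertain, equicorrelated measurements."""
    variance = sigma ** 2 * (1.0 + (n - 1) * rho) / n
    if variance < 0:
        raise ValueError("rho is too negative to be a valid correlation for this n")
    return math.sqrt(variance)


def effective_number(n, rho):
    """Number of independent measurements carrying the same information as the mean."""
    return n / (1.0 + (n - 1) * rho)


for rho in (0.0, 0.1, 0.5):
    print(rho, error_on_mean(1.0, 20, rho), effective_number(20, rho))
```

With 20 measurements and $\rho = 0.1$, for instance, the effective number of independent measurements is already below 7.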

Bearing the above points in mind, we return to the panels
in Figures 12 and 13 where the accuracy seems to be
significantly below $N_\sigma = 1$. This is most obvious in the
second panel ($\Omega_\Lambda$) of Figure 12. Such a result could be a
sign either that the error bars have been significantly
overestimated, or that researchers have been influenced by
prior results ("confirmation bias"), or a combination of the
two. If we return to the data points which led to the last
two bins of panel two of Figure 12, we find something
especially striking. Of those 28 measurements, only 2 are more
than $1\sigma$ from the fiducial results of Table 1. These 28
measurements were carried out by approximately 11 separate
groups (as determined by authorship lists) using several
different techniques.
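To give a sense of why this is striking, one can ask how often independent, correctly quoted Gaussian errors would place at most 2 of 28 measurements beyond $1\sigma$. The sketch below computes this binomial probability; the function name is hypothetical. It treats the measurements as independent, which the discussion of correlated data and shared priors suggests is not the case, so it likely overstates how unlikely the observed clustering is.

```python
import math


def prob_at_most_k_beyond(n, k, threshold=1.0):
    """Binomial probability that at most k of n independent Gaussian measurements
    lie more than `threshold` quoted sigma from the true value."""
    p = math.erfc(threshold / math.sqrt(2.0))  # two-sided Gaussian tail, ~0.317 at 1 sigma
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


print(prob_at_most_k_beyond(28, 2))  # the 28 Omega_Lambda measurements in the last two bins
print(prob_at_most_k_beyond(23, 2))  # the same question for the 23 recent SN measurements
```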

This closeness of published results to the "correct" ones
is somewhat worrying for future measurements. One can
interpret it as coming partly from error bars being
overestimated by cautious cosmologists, for example by
including possible systematic errors in the error bars which
are not actually present to such a large degree, or, in a
related point, by authors marginalizing over parameters which
are actually better known than was assumed. We note that
including or excluding the quoted systematic error bars
(Section 3.4) has little effect on this result. An additional
question is why some parameters have $N_\sigma < 1$ and others do
not (e.g., $\sigma_8$). The relatively small size of our whole
dataset precludes us from making any strong statements about
this issue. If it does partly result from confirmation bias,
one can also wonder how observers knew which value of
$\Omega_\Lambda$ (for example) would be the "correct" one, given that
our fiducial (mostly WMAP7) results from Table 1 were
published in 2011. If this bias is present, it is probably
related to the mean level for $\Omega_\Lambda$ resulting from several
prior measurements. For example, in Figure 8 and others, the
value of the parameter seems to be fairly well determined by
2003 at the latest.

If we look at the techniques which are often associated
with dark energy measurements, SN and BAO, we can see in
Figure 13 that these two have low $N_\sigma$ for recent
measurements. Of the 23 measurements which were included in
the last bins of the SN panel of Figure 13, only 2 are more
than $1\sigma$ from the fiducial result. We note that this fiducial
result from Table 1 does include BAO measurements, but not SN
estimates of dark energy. In the case of SN, however, only 4
measurements of $\Omega_\Lambda$ are included in these bins, from only
2 separate groups of researchers, so that for that subset of
data, statistical fluctuations may well be responsible for the
low $N_\sigma$ seen. If confirmation bias is present, on the other
hand, one could argue about who is confirming whom: certainly
the first SN results on dark energy predate those from BAO and
from most other techniques. These sorts of questions might be
addressed by a more detailed look at the published
measurements, including details of priors, jointly used
datasets and analysis techniques. Then again, small number
statistics would probably not allow firm conclusions to be
drawn. These hints should instead serve as a warning that
care, and perhaps concrete steps, should be taken to avoid any
confirmation bias in the future.

In conclusion, we have seen that huge progress has been
made in the 20 year period covered by our study. Important
questions have been resolved (e.g., is the Universe open? Do
massive neutrinos contribute substantially to the dark matter
density?), a model has been found which agrees with
essentially all observational data so far ($\Lambda$CDM), and the
parameters of that model have been pinned down at the
1-10% level. The first WMAP results (e.g. as presented in
Spergel et al. (2003)) form a watershed which is easy to pick
out in most plots of parameters with time, and serve as a
reminder that statistically measurable progress is not always
gradual. Perhaps the most interesting aspect of our study, the
comparison of the accuracy of past results with our most
recent knowledge, has found that understanding of systematic
errors and uncertainties in cosmological measurements has
demonstrably improved since the early 1990s. On average,
results in the last 10 years are consistent with expectations
given their error bars, something which should instill
confidence in future measurements. There are some signs that
recent measurements of dark energy parameters are closer to
the "expected" values for $\Lambda$CDM than is statistically likely.
These may be explainable by correlations between measurements
which we have not included. On the other hand, this may serve
as a sign that, as cosmology collaboration sizes increase,
carrying out more blind analyses (as in particle physics) may
be a good idea.